Corelab Seminar
2010-2011

Paris Koutris (University of Washington, NTUA)
Parallel Evaluation of Conjunctive Queries

Abstract.
The availability of large data centers with tens of thousands of servers has led to the popular adoption of massive parallelism for data analysis on large data sets. Several query languages exist for running queries on massively parallel architectures, some based on the MapReduce infrastructure, others using proprietary implementations. Motivated by this trend, we analyze the parallel complexity of conjunctive queries. We propose a very simple model of parallel computation that captures the searchitectures, in which the complexity parameter is the number of parallel steps requiring synchronization of all servers. We study the complexity of conjunctive queries and give a complete characterization of the queries which can be computed in one parallel step. These form a strict subset of hierarchical queries, and include flat queries like R(x,y), S(x,z), T(x,v),U(x,w), tall queries like R(x), S(x,y), T(x,y,z), U(x,y,z,w), and combinations thereof, which we call tall-flat queries. We describe an algorithm for computing inparallel any tall-flat query, and prove that any query that is not tall-flat cannot be computed in one step in this model. Finally, we present extensions of our results to queries that are not tall-flat.

back